MapReduce Frameworks: Comparing Hadoop and HPCC

Authors

  • Fabian Fier
  • Eva Höfer
  • Johann-Christoph Freytag
Abstract

MapReduce and Hadoop are often used synonymously. For optimal runtime performance, Hadoop users have to consider various implementation details and configuration parameters. When conducting performance experiments with Hadoop on different algorithms, it is hard to choose a set of such implementation optimizations and configuration options which is fair to all algorithms. By fair we mean default configurations and automatic optimizations provided by the execution system which ideally do not require manual intervention. HPCC is a promising alternative open source implementation of MapReduce. We show that HPCC provides sensible default configuration values allowing for fairer experimental comparisons. On the other hand, we show that HPCC users still have to consider implementing optimizations known from Hadoop.
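The abstract treats MapReduce as a programming model with multiple implementations (Hadoop, HPCC). For readers unfamiliar with the model itself, the map/shuffle/reduce phases can be sketched in plain Python with the classic word-count example; this is an illustrative sketch of the model, not code from either framework:

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit (word, 1) pairs, as a mapper in Hadoop or HPCC would."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the grouped values, here by summing counts."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["hadoop and hpcc", "hadoop mapreduce"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'hadoop': 2, 'and': 1, 'hpcc': 1, 'mapreduce': 1}
```

In a real cluster the map and reduce phases run in parallel across nodes and the shuffle moves data over the network; the implementation details and configuration of exactly those steps are what the paper argues make fair cross-algorithm comparisons difficult.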


Related articles

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. The current Hadoop and the existing Hadoop distributed file system's rack-aware data placement strategy assume a homogeneous cluster, in which each node has the same computing capacity and the same workload is assigned to each node. Default Hadoop d...
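The snippet above contrasts the default uniform placement with capacity-aware placement in heterogeneous clusters. The core idea of distributing data blocks in proportion to node capacity can be sketched as follows; the node names and capacity figures are hypothetical, and this is a simplified illustration of the principle rather than the paper's actual algorithm:

```python
def place_blocks(num_blocks, capacities):
    """Assign data blocks to nodes in proportion to their computing
    capacity, instead of uniformly as a homogeneous strategy assumes."""
    total = sum(capacities.values())
    shares = {node: int(num_blocks * cap / total)
              for node, cap in capacities.items()}
    # Hand out any remainder (from integer truncation) to the fastest nodes.
    leftover = num_blocks - sum(shares.values())
    for node in sorted(capacities, key=capacities.get, reverse=True):
        if leftover == 0:
            break
        shares[node] += 1
        leftover -= 1
    return shares

# Hypothetical cluster: node-a is 4x, node-b 2x the speed of node-c.
capacities = {"node-a": 4.0, "node-b": 2.0, "node-c": 1.0}
print(place_blocks(14, capacities))  # {'node-a': 8, 'node-b': 4, 'node-c': 2}
```

Under uniform placement each node would receive the same number of blocks, so the slowest node would dominate job completion time; proportional placement lets all nodes finish their local work at roughly the same time.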


HPCC Systems : Data Intensive Supercomputing Solutions

Executive Summary As a result of the continuing information explosion, many organizations are drowning in data and the resulting " data gap " or inability to process this information and use it effectively is increasing at an alarming rate. Data-intensive computing represents a new computing paradigm which can address the data gap using scalable parallel processing and allow government and comm...


Towards a next generation of scientific computing in the Cloud

More than ever, designing new types of highly scalable data-intensive computing is needed to enable the new generation of scientific computing and analytics to effectively perform complex tasks on massive amounts of data, such as clustering, matrix computation, data mining, and information extraction. MapReduce, put forward by Google, is a well-known model for programming commodity computer cl...


Adaptive Load Balancing in MapReduce using Flubber

MapReduce has emerged as a successful framework for addressing the heavy demand for large-scale analytical data processing in this petabyte age. However, while on one hand the sheer size of data makes problems more challenging, the flexibility offered by MapReduce frameworks on the other hand makes the learning curve far steeper than expected. The general idea behind a MapReduce framewor...


Benchmarking and Performance studies of MapReduce / Hadoop Framework on Blue Waters Supercomputer

MapReduce is an emerging and widely used programming model for large-scale data-parallel applications that require processing large amounts of raw data. There are several implementations of the MapReduce framework, among which Apache Hadoop is the most commonly used open source implementation. These frameworks are rarely deployed on supercomputers as massive as Blue Waters. We want to evaluate ho...



Publication date: 2016